preference constraint
Reviews: Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints
This paper formalizes the problem of inverse reinforcement learning in which the learner's goal is not only to imitate the teacher's demonstration, but also to satisfy her own preferences and constraints. It analyzes the suboptimality of learner-agnostic teaching, where the teacher gives demonstrations without considering the learner's preferences. It then proposes a learner-aware teaching algorithm, where the teacher selects demonstrations while accounting for the learner's preferences. It considers different types of learner models with hard or soft preference constraints. It also develops learner-aware teaching methods for both cases: when the teacher has full knowledge of the learner's constraints and when the teacher does not know them.
Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints
Tschiatschek, Sebastian, Ghosh, Ahana, Haug, Luis, Devidze, Rati, Singla, Adish
Inverse reinforcement learning (IRL) enables an agent to learn complex behavior by observing demonstrations from a (near-)optimal policy. The typical assumption is that the learner's goal is to match the teacher's demonstrated behavior. In this paper, we consider the setting where the learner has her own preferences that she additionally takes into consideration. These preferences can for example capture behavioral biases, mismatched worldviews, or physical constraints. We study two teaching approaches: learner-agnostic teaching, where the teacher provides demonstrations from an optimal policy ignoring the learner's preferences, and learner-aware teaching, where the teacher accounts for the learner's preferences. We design learner-aware teaching algorithms and show that significant performance improvements can be achieved over learner-agnostic teaching.
Optimal Symbolic Planning with Action Costs and Preferences
Edelkamp, Stefan (University of Bremen) | Kissmann, Peter (TU Dortmund)
This paper studies the solving of finite-domain action planning problems with discrete action costs and soft constraints. For sequential optimal planning, a symbolic perimeter database heuristic is addressed in a bucket implementation of A*. For computing net-benefits, we propose symbolic branch-and-bound search together with some search refinements. The net-benefit we optimize is the total benefit of satisfying the goals, minus the total action cost to achieve them. This results in an objective function to be minimized that is a linear expression over the violation of the preferences added to the action cost total.
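The relationship between the net-benefit and the minimized objective described above can be illustrated with a minimal sketch. It assumes each soft constraint (preference) carries a fixed weight that is both the benefit gained when it is satisfied and the penalty paid when it is violated; the function names and the example weights are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Set


def net_benefit(action_costs: List[int], weights: Dict[str, int], satisfied: Set[str]) -> int:
    """Total benefit of the satisfied preferences minus the total action cost."""
    benefit = sum(w for name, w in weights.items() if name in satisfied)
    return benefit - sum(action_costs)


def minimization_objective(action_costs: List[int], weights: Dict[str, int], satisfied: Set[str]) -> int:
    """Linear penalty over violated preferences added to the total action cost."""
    penalty = sum(w for name, w in weights.items() if name not in satisfied)
    return penalty + sum(action_costs)


# Hypothetical plan: three actions, two weighted preferences, only "p1" satisfied.
costs = [2, 3, 1]
prefs = {"p1": 5, "p2": 4}
done = {"p1"}

# The two quantities differ only by the constant sum of all weights,
# so maximizing one is the same as minimizing the other.
assert net_benefit(costs, prefs, done) == sum(prefs.values()) - minimization_objective(costs, prefs, done)
```

Under this assumption the two formulations rank plans identically, which is why the objective can be stated as a minimization.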
Learning Preferences for Multiclass Problems
Aiolli, Fabio, Sperduti, Alessandro
Many interesting multiclass problems can be cast in the general framework of label ranking defined on a given set of classes. The evaluation for such a ranking is generally given in terms of the number of violated order constraints between classes. In this paper, we propose the Preference Learning Model as a unifying framework to model and solve a large class of multiclass problems in a large margin perspective. In addition, an original kernel-based method is proposed and evaluated on a ranking dataset with state-of-the-art results.
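As a rough illustration of the evaluation measure mentioned above, the following sketch counts violated order constraints for a single example. It assumes the ranking is induced by per-class scores and that a tie counts as a violation; the function name, scores, and constraints are hypothetical and not taken from the paper.

```python
from typing import Dict, List, Tuple


def count_violated_constraints(scores: Dict[str, float], constraints: List[Tuple[str, str]]) -> int:
    """Count pairs (preferred, other) where the preferred class is not scored strictly higher."""
    # Assumption: a tie is treated as a violation of the order constraint.
    return sum(1 for preferred, other in constraints if scores[preferred] <= scores[other])


# Hypothetical scores for three classes and three pairwise order constraints.
scores = {"a": 0.9, "b": 0.2, "c": 0.5}
constraints = [("a", "b"), ("b", "c"), ("a", "c")]

violations = count_violated_constraints(scores, constraints)  # only ("b", "c") is violated
```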